PoC faster LSMs with static calls/keys: ask for feedbacks #3
PaulRenauld wants to merge 15 commits into queue-2020
Conversation
create keys and add to static list minimum working poc wip cleanup
for-loop like struct slot get rid of macros use generic array of slots cleanup
bjackman left a comment
Nice progress! Added a few proper code review comments as well as general notes
With the addition of static_call_cond_int to the static call API, we no longer need static keys. A default value is provided and placed in %eax before the call, so if the call is replaced by a nop, the default value sits where the return value would be. For now, this does not work when one of the call's arguments is itself a function call (%eax gets overwritten), hence the modification to security_mmap_file.
20f16ad to 3b16261
3b16261 to fa08676
PaulRenauld left a comment
Update: we got rid of the static call API extension. Instead, we use a switch statement.
The linked list of each LSM hook is copied into the slots by lsm_init_hook_static_slot. The slots are filled from first_slot to the last one (e.g., with 10 slots and 3 functions, the functions end up in slots 7, 8, and 9).
To call the hooks, the switch statement jumps to the first used slot and then falls through to the following ones.
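A minimal userspace sketch of the fill-from-the-end layout and the fall-through dispatch described above. It is not the PR's code: the names (hook_slots, fill_slots, the sample hooks) and the slot count of 4 are assumptions for illustration, and plain function pointers stand in for the static calls the kernel version would patch in.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define SLOT_COUNT 4 /* hypothetical; the PR uses SECURITY_STATIC_SLOT_COUNT */

typedef int (*hook_fn)(int arg);

struct hook_slots {
    hook_fn slots[SLOT_COUNT];
    int first_slot; /* INT_MAX if no slot is used */
};

/* Sample hooks, purely illustrative. */
static int allow_all(int arg) { (void)arg; return 0; }
static int deny_negative(int arg) { return arg < 0 ? -13 : 0; }

/* Copy n callbacks into the LAST n slots, so used slots are contiguous. */
static int fill_slots(struct hook_slots *h, const hook_fn *fns, int n)
{
    int first = SLOT_COUNT - n;

    if (first < 0)
        return -1; /* more callbacks than slots */
    for (int i = 0; i < SLOT_COUNT; i++)
        h->slots[i] = NULL;
    for (int i = 0; i < n; i++)
        h->slots[first + i] = fns[i];
    h->first_slot = n > 0 ? first : INT_MAX;
    return 0;
}

/* Jump to the first used slot, then fall through; stop on non-zero rc. */
static int call_int_hook(const struct hook_slots *h, int arg)
{
    int rc = 0;

    assert(h != NULL);
    switch (h->first_slot) {
    case 0:
        rc = h->slots[0](arg);
        if (rc != 0)
            break;
        /* fall through */
    case 1:
        rc = h->slots[1](arg);
        if (rc != 0)
            break;
        /* fall through */
    case 2:
        rc = h->slots[2](arg);
        if (rc != 0)
            break;
        /* fall through */
    case 3:
        rc = h->slots[3](arg);
        break;
    default: /* INT_MAX: no slot in use, keep the default rc */
        break;
    }
    return rc;
}
```

Because the used slots always sit at the end of the array, a single integer (first_slot) is enough to skip every empty slot without testing each one.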
security/security.c
Outdated
    struct security_list_static_slots security_list_static_slots
        __lsm_ro_after_init = {
    #define DEFINE_SLOT(NUM, NAME) \
        (struct security_static_slot) { \
            .key = &STATIC_CALL_KEY(STATIC_SLOT(NAME, NUM)), \
            .tramp = &STATIC_CALL_TRAMP(STATIC_SLOT(NAME, NUM)) \
        },
    #define LSM_HOOK(RET, DEFAULT, NAME, ...) \
        .NAME = { \
            .slots = { \
                SECURITY_FOREACH_STATIC_SLOT(DEFINE_SLOT, NAME) \
            }, \
            .first_slot = INT_MAX, \
            .head = &security_hook_heads.NAME \
        },
    #include <linux/lsm_hook_defs.h>
    #undef LSM_HOOK
    #undef DEFINE_SLOT
    };
We could also get rid of this struct (only keep the first_slot for each hook) and then use the X-macro in lsm_init_static_slots. This would save memory that is used only during the init, but make the code heavier.
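For readers unfamiliar with the X-macro pattern this comment relies on, here is a small self-contained sketch. It assumes a list macro (DEMO_HOOK_LIST, a hypothetical stand-in for re-including linux/lsm_hook_defs.h with LSM_HOOK redefined) and keeps only a first_slot per hook, roughly the slimmer layout suggested above.

```c
#include <limits.h>

/* Hypothetical stand-in for linux/lsm_hook_defs.h: one X(...) per hook. */
#define DEMO_HOOK_LIST(X)          \
    X(int, 0, file_permission)     \
    X(int, 0, file_open)           \
    X(int, 0, mmap_file)

/* First expansion: one first_slot field per hook. */
struct demo_first_slots {
#define X(RET, DEFAULT, NAME) int NAME;
    DEMO_HOOK_LIST(X)
#undef X
};

/* Second expansion: every hook starts with no slot in use. */
static struct demo_first_slots demo_first_slots = {
#define X(RET, DEFAULT, NAME) .NAME = INT_MAX,
    DEMO_HOOK_LIST(X)
#undef X
};

/* Third expansion: count the hooks at compile time (0 + 1 + 1 + 1). */
enum {
#define X(RET, DEFAULT, NAME) + 1
    DEMO_HOOK_COUNT = 0 DEMO_HOOK_LIST(X)
#undef X
};
```

The same list drives the struct definition, its initializer, and the count, so adding a hook in one place updates all three.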
include/linux/lsm_hooks.h
Outdated
     * INT_MAX if no slot is used.
     */
    int first_slot;
    /* hlist corresponding to this hook */
Just realised we should probably have a comment explaining why we still have the hlist as well as the static slots
security/security.c
Outdated
    int slot_idx, slot_cnt, first_slot;

    slot_cnt = 0;
    // todo: race condition if hlist modified in between the two foreach
I assume this is impossible since this is all happening during init, no?
That's my understanding as well (I put the comment just to make sure). At least as long as we cannot add and remove LSMs at arbitrary times. The security_delete_hooks function might be annoying here.
security/security.c
Outdated
        } \
    } while (0); \
    RC; \
    #define CALL_STATIC_SLOT_INT(NUM, R, HOOK, ...) \
This is a super unhygienic macro - I think that's unavoidable but let's put some __ at the beginning of the name so nobody mistakes it for a "proper" macro!
…he init the static calls
    first_slot = SECURITY_STATIC_SLOT_COUNT - slot_cnt;
    if (first_slot < 0)
        panic("%s - No static hook slot remaining to add LSM hook.\n",
Perhaps instead of panicking when this happens, you could have a fallback path that places a pointer to the normal security_* helper function in the first slot and NOPs out the rest?
That's an interesting recovery. Right now the idea is that we should have as many static calls for each hook as there are LSMs in the kernel, so this wouldn't happen.
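The suggested fallback could look roughly like the userspace sketch below. Everything here is an assumption for illustration: an array stands in for the hlist, walk_hook_list stands in for the generic list-walking helper, NULL stands in for a NOPed slot, and the slot count is deliberately tiny so the overflow path is exercised.

```c
#include <assert.h>
#include <stddef.h>

#define SLOT_COUNT 2 /* deliberately small to trigger the fallback */
#define MAX_HOOKS 8

typedef int (*hook_fn)(int arg);

static hook_fn hook_list[MAX_HOOKS]; /* stand-in for the hook's hlist */
static int hook_list_len;
static hook_fn slots[SLOT_COUNT];    /* NULL models a NOPed slot */

/* Sample hooks, purely illustrative. */
static int allow_all(int arg) { (void)arg; return 0; }
static int deny_negative(int arg) { return arg < 0 ? -13 : 0; }

static int register_hook(hook_fn fn)
{
    if (hook_list_len >= MAX_HOOKS)
        return -1;
    hook_list[hook_list_len++] = fn;
    return 0;
}

/* Slow path: walk the whole list, stopping at the first non-zero rc. */
static int walk_hook_list(int arg)
{
    for (int i = 0; i < hook_list_len; i++) {
        int rc = hook_list[i](arg);

        if (rc != 0)
            return rc;
    }
    return 0;
}

/* Fill the slots; on overflow, fall back to the list walker instead of panicking. */
static void init_slots(void)
{
    int first = SLOT_COUNT - hook_list_len;

    for (int i = 0; i < SLOT_COUNT; i++)
        slots[i] = NULL;
    if (first < 0) {
        slots[0] = walk_hook_list; /* one slot covers every registered hook */
        return;
    }
    for (int i = 0; i < hook_list_len; i++)
        slots[first + i] = hook_list[i];
}

static int call_slots(int arg)
{
    for (int i = 0; i < SLOT_COUNT; i++) {
        int rc;

        if (slots[i] == NULL)
            continue;
        rc = slots[i](arg);
        if (rc != 0)
            return rc;
    }
    return 0;
}
```

In the overflow case the hook loses the static-call speedup but behaves as before, which is a softer failure than a panic.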
include/linux/loop_unrolling.h
Outdated
     * MACRO(2, ...)
     * ...
     */
    #define M_LOOP_UNROLLING(N, MACRO, ...) \
I think UNROLL_MACRO_LOOP would be a better name. Or just REPEAT_MACRO?
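As an illustration of what such a repeat macro does, here is one common way to write it with token pasting. This is a sketch, not the contents of loop_unrolling.h: the REPEAT_n helpers, the 0-based index, and DEFINE_SLOT_NAME are assumptions chosen for the example.

```c
#include <assert.h>
#include <string.h>

/* One helper per supported count; each adds one more expansion of M. */
#define REPEAT_1(M, ...) M(0, __VA_ARGS__)
#define REPEAT_2(M, ...) REPEAT_1(M, __VA_ARGS__) M(1, __VA_ARGS__)
#define REPEAT_3(M, ...) REPEAT_2(M, __VA_ARGS__) M(2, __VA_ARGS__)

/* Extra indirection so that N is macro-expanded before pasting. */
#define REPEAT_PASTE(N) REPEAT_##N
#define REPEAT_MACRO(N, M, ...) REPEAT_PASTE(N)(M, __VA_ARGS__)

#define SLOT_COUNT 3

/* Example body: build one string per slot from the hook name and index. */
#define DEFINE_SLOT_NAME(NUM, HOOK) #HOOK "_slot_" #NUM,

static const char *const slot_names[] = {
    REPEAT_MACRO(SLOT_COUNT, DEFINE_SLOT_NAME, file_permission)
};
```

The count must expand to a literal that matches one of the helpers, so raising the slot limit means adding more REPEAT_n lines.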
security/security.c
Outdated
     * Static slots are placeholders for potential LSM hooks.
     * Instead of a costly indirect call, they use static calls.
     */
    #define SECURITY_STATIC_SLOT_COUNT 3
Let's also now increase this to be 16 or whatever
PR #4 separates this RFC into clean commits.
Copied from #2 (rebased after the static call branch on the queue was updated to contain the most recent changes on master):
This patch is to ask for comments and suggestions.
This uses static keys and static calls to get rid of the indirect calls for the LSMs.
For each LSM hook, there are 3 "slots" (the number can be changed). A slot corresponds to a static key, which indicates if this slot is used or not, and a static call, which would contain the function to call if the slot is used.
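To make the slot idea concrete, here is a userspace model under stated assumptions: a bool stands in for the static key and a function pointer for the static call target, so none of the runtime code patching is represented. The names (slot, attach, the lsm_a/lsm_b hooks) are hypothetical and not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SLOT_COUNT 3

typedef int (*hook_fn)(int mask);

struct slot {
    bool enabled; /* models the static key: is this slot in use? */
    hook_fn fn;   /* models the static call target */
};

static struct slot file_permission_slots[SLOT_COUNT];

/* Hypothetical LSM callbacks for the file_permission hook. */
static int lsm_a_file_permission(int mask) { (void)mask; return 0; }
static int lsm_b_file_permission(int mask) { return (mask & 0x2) ? -13 : 0; }

/* Put fn into the first free slot; fail when all slots are taken. */
static int attach(hook_fn fn)
{
    for (int i = 0; i < SLOT_COUNT; i++) {
        if (!file_permission_slots[i].enabled) {
            file_permission_slots[i].fn = fn;
            file_permission_slots[i].enabled = true;
            return 0;
        }
    }
    return -1;
}

/* Unrolled by hand: one key test and one call per slot, no loop. */
static int call_file_permission(int mask)
{
    int rc = 0;

#define CALL_SLOT(N)                                      \
    if (rc == 0 && file_permission_slots[N].enabled)      \
        rc = file_permission_slots[N].fn(mask);
    CALL_SLOT(0)
    CALL_SLOT(1)
    CALL_SLOT(2)
#undef CALL_SLOT
    return rc;
}
```

In the kernel version the enabled test would compile to a patched branch and the call to a direct call, which is where the saving over an indirect call comes from.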
For example, for the hook file_permission, we will define:
Although the static key API provides an array definition, static calls have no equivalent, and adding one would be non-trivial (code here). This makes it hard to provide a cleaner structure, and we cannot use a for-loop to access the slots. Instead, we use a macro FOR_EACH_HOOK_SLOT(M, ...) which will expand.
We also need to keep in mind that not all functions use call_int_hook/call_void_hook. Some access the list directly. Therefore we still need to add the hooks to the linked list. Right now, we cannot have more than 3 LSMs on a specific hook. If we want to remove the limit, we could add all the LSMs to the hook list, but keep a pointer to the 3rd element. Then, in call_int_hook/call_void_hook, keep the iteration on the list, but start at index 3. This way, the first three LSMs are optimized, and the others have the same cost as before.
Tasks specific to the LSMs:
Tasks for the static call API: